Abstract 

Data mining provides large benefits to the commercial, government and homeland security 
sectors, but the aggregation and storage of huge amounts of data about citizens inevitably leads 
to an erosion of privacy. To achieve the benefits that data mining has to offer, while at the 
same time enhancing privacy, we need technological solutions that simultaneously enable data 
mining while preserving privacy. This need has been recognized by the US government, as can 
be seen in the February 2008 report on data mining by the Office of the Director of National 
Intelligence (see pages 9-12). In this paper, we present surprisingly simple and extraordinarily 
efficient protocols for a number of non-trivial tasks related to privacy-preserving data mining. 
Our protocols use standard smartcards and standard smartcard infrastructure, and are the first 
truly practical solutions for these problems that provide strong security guarantees. 

1 Introduction 

Background — privacy-preserving data mining. It is well known that the privacy of citizens 
has significantly eroded over the last two decades. The advent of huge databases and the tools to 
manipulate the data in those databases (e.g., via data mining) means that an enormous amount of 
information about citizens is in the hands of governments and commercial entities. This information 
is being used for many social and commercial purposes, and also in the interests of homeland security 
where this latter use is believed to be one of the prime sources of the erosion of citizens' privacy 
(at least in the USA). 

There are two main approaches regarding how to deal with the problems of privacy that arise 
today. The first is a legal and policy approach whereby organizations are limited in how they store 
and use data based on privacy law and public policy. It typically works by evaluating scenarios 
and deciding if the privacy breach caused by using the data in a given way is justified or not. The 
second approach is technological, and provides enforced privacy guarantees through cryptographic 
means. This approach has the capability of enabling the data to be used while preventing (or at 
least minimizing) privacy breaches. Thus, technological solutions can actually enable greater data 
utilization, while providing strong privacy guarantees. Furthermore, technological solutions enforce 
good behavior and so prevent privacy breaches even in the presence of malicious entities. This is 
in contrast to the legal and policy approach, which relies on the deterrence of penalties. 

To demonstrate the technological approach, consider a government project for improving social 
services by studying the health status of people on social welfare. The importance of such a project 
is clear, but it requires sharing sensitive health information from hospitals and HMOs with the 
government, and cross-referencing it with social welfare records, thereby infringing upon the 
privacy of patients in an unacceptable manner. Cryptographic protocols from the field of secure 
multiparty computation (see below) can solve this problem and allow the computation to be carried 
out without actually sharing the data. 1 Each participating organization learns only the output, 
and no one learns anything about the databases of the other organizations. Furthermore, privacy 
is guaranteed even if one or more of the agencies attempt to learn more than they are allowed. 

Although secure multiparty computation can be used to solve the aforementioned problems, 
these cryptographic protocols are typically very expensive computationally, and are far from efficient 
enough to be used in practice. Furthermore, most protocols known for this setting are only secure if 
the adversary is assumed to be semi-honest, meaning that it follows the protocol specification (but 
just tries to learn more information than it should from the protocol transcript). This assumption 
is unrealistic in many - if not most - applications of privacy-preserving data mining. We stress that 
even under this assumption, most known protocols are not efficient enough to be used in practice on 
large data sets. The problem becomes even more acute when considering malicious adversaries who 
can follow any arbitrary strategy to attack the protocol. In this case, very few efficient protocols 
exist. We conclude that although cryptography can theoretically be used to solve these privacy 
problems, such solutions are still far from being practical. 

The US government has expressed interest in solutions of this type, as is explicitly stated in 
the Data Mining Report (February 2008) of the Office of the Director of National Intelligence. The 
following is a quote from page 11 of that report (the context is a request for research proposals on 
the topic): 

Technology areas of particular interest include (but are not limited to) the following: 

• Secure multi-party function evaluation. While the mathematics of this technology has 
been studied for some time, practical applications have been lacking. Projects that 
can demonstrate how this technology could be applied to problems of realistic scale 
and complexity will be of interest. For example, agencies at different levels of the 
U.S. government, as well as selected foreign government and private sector entities, 
are all interested in comparing intelligence information concerning terrorist 
financing, yet these entities may be unwilling or unable to disclose their own detailed 
information for fear of violating privacy rules or compromising sources and 
methods. Secure multi-party function evaluation might provide a way for such entities 
to cooperate in computing the results regarding such financial flows without 
either sharing the information with each other or resorting to a trusted third party 
to compute it for them. 

Background — secure multiparty computation. In the setting of secure multiparty computation, 
a set of parties with private inputs wish to jointly compute some functionality of their inputs. 
Loosely speaking, the security requirements of such a computation are that (i) nothing is learned 
from the protocol other than the output (privacy), (ii) the output is distributed according to the 
prescribed functionality (correctness), and (iii) parties cannot make their inputs depend on other 
parties' inputs (independence of inputs). For example, in a secure election, privacy guarantees that 
no party's individual vote 
is learned (rather, only the outcome is revealed), correctness guarantees that the candidate with the 
most votes is the one that wins the election, and independence of inputs essentially means that an 
individual's vote is cast without any information about how others have voted. Secure multiparty 
computation forms the basis for a multitude of tasks, including those as simple as coin-tossing and 
agreement, and as complex as electronic voting and auctions, electronic cash schemes, anonymous 
transactions, remote game playing (a.k.a. "mental poker"), and privacy-preserving data mining. 

The security requirements in the setting of multiparty computation must hold even when some 
of the participating parties are adversarial. In this paper, we consider malicious adversaries that can 
arbitrarily deviate from the protocol specification. It has been shown that, with the aid of suitable 
cryptographic tools, any two-party or multiparty function can be securely computed [25, 14, 13, 4, 7] 
in the presence of malicious adversaries. However, protocols that achieve this level of security are 
rarely efficient enough to be used in practice, even for relatively small inputs. 

Recently, there has been much interest in the data mining and other communities in secure 
protocols for a wide variety of tasks. This interest exists not only in academic circles, but also 
in industry and government, in part due to the growing conflict between the privacy concerns of 
citizens and the homeland security needs of governments. Unfortunately, however, truly practical 
protocols that also achieve proven security are currently far out of reach. This is especially the 
case when security in the presence of malicious adversaries is considered (see related work for other 
models). 

Smartcard-aided secure computation. In this paper, we construct protocols that use smartcards 
in addition to standard network communication. Specifically, in addition to sending messages 
over a network, the participating parties may initialize smartcards in some way and send them to 
each other. Of course, such a modus operandi is only reasonable if this is not over-used. In all of our 
protocols, one party initializes a smartcard and sends it to the other, and that is all. Importantly, it 
is also sufficient to send a smartcard once, which can then be used for many executions of the 
protocol (and even for different protocols). This model is clearly not suitable for protocols that must 
be run by ad hoc participants over the Internet (e.g., for secure eBay auctions or secure Internet 
purchases). However, we argue that it is suitable whenever parties with non-transient relationships 
need to run secure protocols. Thus, this model is suitable for the purpose of privacy-preserving data 
mining between commercial, governmental and security agencies. We construct practical two-party 
protocols for the following tasks: 

• Secure set intersection: This problem is of great interest in practice and has many applications. 
Some examples are: finding out if someone is on two security agencies' list of suspects, finding 
out if someone illegally receives social welfare from two different agencies, finding out which 
patients receive medical care at two different medical centers, and so on. This problem has 
received a lot of attention due to its importance; see [22, 11, 18] for some examples. We present 
a protocol that is far more efficient than any currently known solution, and provides the highest 
level of security. Our protocol is surprisingly simple, and essentially requires one party to carry 
out one 3DES or AES computation on each set element (using a regular PC), while the other 
party carries out the same computations using a smartcard. Thus, for sets comprised of 30,000 
elements, the first party's computation takes approximately 20 seconds and the second party's 
computation takes approximately 30 minutes (but can be parallelized, meaning that using 10 
smartcards, the computation would take approximately 3 minutes). In our protocol, only the 
second party receives output. 



• Oblivious database search: In this problem, a client is able to search a database held by a 
server so that: (a) the client can only carry out a single search (or a predetermined number 
of searches authorized by the server), and learns nothing beyond the result of the authorized 
searches; and (b) the server learns nothing about the searches carried out by the client. We 
remark that searches are as in the standard database setting: the database has a "key attribute" 
and each record has a unique key value; searches are then carried out by inputting a key value 
- if the key exists in the database then the client receives back the entire record; otherwise 
it receives back a "non-existent" reply. This problem has been studied in [8, 10] and has 
important applications to privacy. For example, consider the case of homeland security where 
it is sometimes necessary for one organization to search the database of another. In order 
to minimize information flow (or stated differently, in order to preserve the "need to know" 
principle), we would like the agency carrying out the search to have access only to the single 
piece of information it is searching for. Furthermore, we would like the value being searched for 
to remain secret. Another, possibly more convincing, application comes from the commercial 
world. The LexisNexis database is a paid service provided to legal professionals that enables 
them - among other things - to search legal research and public records, for the purpose of 
case preparation. Now, the content of searches made for case preparation is highly confidential; 
this information reveals much about the legal strategy of the lawyers preparing the case, and 
would allow the other side to prepare counter-arguments well ahead of time. It is even possible 
that revealing the content of some of these searches may breach attorney-client privilege. We 
conclude that the searches made to LexisNexis must remain confidential, and even LexisNexis 
should not learn them (either because it may be corrupted or, more likely, because a breach of its 
system could be used to steal this confidential information). Oblivious database search can 
be used to solve this exact problem. We present a protocol for oblivious database search that 
reaches a level of efficiency that is almost equivalent to a non-private database search. Once 
again, we achieve provable security in the presence of malicious adversaries. 

• Oblivious document search: A similar but seemingly more difficult problem to that of oblivious 
database search is that of oblivious document search. Here, the database is made up of a series 
of unstructured documents and a keyword query should return all documents that contain that 
query. This is somewhat more difficult than the previous problem because of the dependence 
between documents (the client should not know if different documents contain the same keyword 
if it has not searched them both). Nevertheless, using smartcards, we present a highly efficient 
protocol for this problem, that is provably secure in the presence of malicious adversaries. We 
remark that in many cases, including the LexisNexis example above, what is really needed is 
the unstructured document search here. 
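
The running-time figures quoted for set intersection in the first task above can be sanity-checked with simple arithmetic. The sketch below assumes roughly 50 milliseconds per smartcard operation (the figure reported in the experimental section) and perfect parallelism across cards; the function name and its parameters are illustrative, not part of the protocol.

```python
# Back-of-the-envelope running-time estimate for the smartcard side of
# set intersection.
# ASSUMPTIONS: about 50 ms per smartcard operation and perfect parallelism
# across cards. Both are illustrative inputs, not new measurements.

def smartcard_minutes(num_elements: int,
                      ms_per_op: float = 50.0,
                      num_cards: int = 1) -> float:
    """Minutes needed to apply the block cipher once to every set element."""
    total_ms = num_elements * ms_per_op / num_cards
    return total_ms / 1000.0 / 60.0


one_card = smartcard_minutes(30_000)
ten_cards = smartcard_minutes(30_000, num_cards=10)
print(f"1 card: {one_card:.1f} min, 10 cards: {ten_cards:.1f} min")
```

Under these assumptions the estimate is of the same order as the approximately 30 minutes (one card) and approximately 3 minutes (ten cards) stated in the text.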

We provide rigorous proofs of security for all of our protocols, under the most stringent definitions 
of security for the case of malicious adversaries that may follow any arbitrary polynomial-time 
strategy (cf. [5, 13] following [15, 3, 21]). Thus, the highest level of security is achieved. As we 
have mentioned, however, we use a smartcard to aid in the computation, unlike the standard model 
of computation. As will become clear, this gives extraordinary power and makes it possible to 
construct protocols that are far more efficient than anything previously known. It is important 
to note that by using smartcards we are opening up the system to additional attacks. 2 However, 
we believe that this is a reasonable tradeoff, especially since without such "help", no reasonable 
solutions exist. See more discussion on this issue below. 



2 The security of cryptographic protocols always assumes that the honest party's systems are not compromised, and 
that the software being run accurately follows the instructions of the protocol. In addition, we now also need to 
assume that the smartcards being used cannot be broken into. 



Standard smartcards — what and why. We stress that our protocols are designed so that any 
standard smartcard can be used. Before proceeding we explain why it is important for us to use 
standard - rather than special-purpose - smartcards, and what functionality is provided by such 
standard smartcards. The reason for our insistence on standard smartcards is twofold: 

1. Ease of deployment: It is much easier to actually deploy a protocol that uses standard 
smartcard technology. This is due to the fact that many organizations have already deployed 
smartcards, typically for authenticating users. However, even if this is not the case, it is 
possible to purchase any smartcard from essentially any smartcard vendor. 3 

2. Trust: If a special-purpose smartcard needs to be used for a secure protocol, then we need to 
trust the vendor who built the smartcard. This trust extends to believing that they did not 
incorrectly implement the smartcard functionality on purpose or unintentionally. In contrast, 
if standard smartcards can be used then it is possible to use smartcards constructed by a 
third-party vendor (and possibly constructed before our protocols were even designed). In 
addition to reducing the chance of malicious implementation, the chance of an unintentional 
error is much smaller, because these cards have been tried and tested over many years. 

We remark that Javacards can also be considered for the application that we are considering. 
Javacards are smartcards with the property that special-purpose Java applets can be loaded onto 
them in order to provide special-purpose functionality. Such solutions are also reasonable. However, 
they do make deployment slightly more difficult as already-deployed smartcards (that are used for 
smartcard logon and VPN authentication, for example) cannot be used. Furthermore, it is necessary 
to completely trust whoever wrote the applet; this can be remedied by having an open-source 
applet which can be checked before it is loaded. Therefore, protocols that do need smartcards with 
some special-purpose functionality can be used, but are slightly less desirable. 

A trusted party? At first sight, it may seem that we have essentially introduced a trusted party 
into the model, and so of course everything becomes easy. We argue that this is not the case. First, 
a smartcard is a very specific type of trusted party, with very specific functionality (especially if 
we focus on standard smartcards). Second, due to it being weak hardware, a smartcard cannot 
carry out a computation on large inputs. Thus, even a special-purpose smartcard cannot directly 
compute set intersection on inputs of size 30,000. Finally, smartcards are used in practice and are 
becoming more and more ubiquitous. Thus, our model truly is a realistic one, and our protocols 
can easily be deployed in practice. 

Trusting smartcards. The security of our protocols is preserved as long as the smartcards used 
are not broken. We base this assumption on the fact that modern smartcards are widely deployed 
today - mostly for authentication - and are rarely broken (we stress that we refer to smartcards 
that have passed certification like FIPS or Common Criteria, and not microprocessors with basic 
protection). Of course, we are referring to smartcards that provide a high level of physical security, 
and not simple microcontrollers. Great progress has been made over the years to make it very hard 
to access the internal memory of a smartcard. Typical countermeasures against physical attacks on 
a smartcard include: shrinking the size of transistors and wires to 200nm (making them too small 
for analysis by optical microscopes and too small for probes to be placed on the wires), multiple 
layering (enabling sensitive areas to be buried beneath other layers of the controller), protective 
layering (a grid is placed around the smartcard and if this is cut, then the chip automatically 
erases all of its memory), sensors (if the light, temperature etc. are not as expected then again 
all internal memory is immediately destroyed), bus scrambling (obfuscating the communication 
over the data bus between different components to make it hard to interpret without full reverse 
engineering), and glue logic (mixing up components of the controller in random ways to make it 
hard to know what components hold what functionality). For more information, we refer the reader 
to [24]. Having said the above, there is no perfect security mechanism and this includes smartcards 
(likewise, there are better and worse smartcards, some being more vulnerable to attack and others 
needing highly specialized equipment and expertise). Nevertheless, we strongly believe that it is 
a reasonable assumption to trust the security of high-end smartcards (for example, smartcards 
that have FIPS 140-2, level 3 or 4 certification). Our belief is also supported by the 
computer-security industry: smartcards are widely used today as an authentication mechanism to protect 
security-critical applications. 

3 Of course, the notion of a "standard" smartcard is somewhat problematic because different vendors construct smartcards 
with different properties. We therefore rely on properties that we know are in the widely-used smartcards sold by Siemens. 

Smartcard authenticity. As we have mentioned, our protocols require one party to initialize 
a smartcard and send it to the other. Furthermore, the recipient of the smartcard needs to trust 
that the device that it receives is really a smartcard of the specified type. Since our protocols 
rely on standard smartcard technology only, this problem essentially reduces to identifying that 
a given device was manufactured by a specified smartcard vendor. In principle, this problem is 
easily solved by having smartcard manufacturers initialize all devices with a public/private key 
pair, where the private key is known only to the manufacturer (and their devices). Then, given a 
device and the manufacturer's public key it is possible to verify that the device is authentic using a 
simple challenge/response mechanism. This solution is not perfect because given the compromise of 
a single smartcard, it is possible to manufacture multiple forged devices. This is highly undesirable 
because it means that the incentive to carry out such an attack can be very high. This can be 
improved by using different public keys for different batches (or even a different key for every 
device, although this is probably too cumbersome in practice). To the best of our knowledge, 
such a mechanism is typically not implemented today (rather, symmetric keys are used instead). 
Nevertheless, it could be implemented without much difficulty and so is not a serious barrier. 

Related work. Secure computation has been studied at great length for over two decades. However, 
highly efficient protocols for problems of interest have only recently been studied intensively, 
under the premise of "privacy-preserving data mining", starting with [20]. Most of the secure 
protocols for this setting have considered semi-honest adversarial behavior, which 
is often not sufficient. Indeed, highly-efficient protocols that are proven secure in the presence of 
malicious adversaries and using the simulation-based approach are few and far between; one notable 
exception being the work of [1] for securely computing the median. Therefore, researchers have 
considered other directions. One possibility is to consider privacy only; see for example [9, 22, 6]. 
A different direction considered recently has been to look at an alternative adversary model that 
guarantees that if an adversary cheats then it will be caught with some probability [2, 16]. We 
stress that our protocols are more efficient than all of the above and also reach a higher level of 
security than most. (Of course, we have the additional requirement of a smartcard and thus a 
direct comparison of the protocols is not really appropriate; rather we view this as a comparison of models.) 



2 Standard Smartcard Functionality 

In this section we describe what functionality is provided by standard smartcards. Our description 
does not include an exhaustive list of all available functions. Rather we describe the most basic 
functionality and some additional specific properties that we use: 

1. On-board cryptographic operations: Smartcards can store cryptographic keys for private and 
public-key operations. Private keys that are stored (for decryption or signing/MACing) can 
only be used according to their specified operation and cannot be exported. We note that 
symmetric keys are always generated outside of the smartcard and then imported, whereas 
asymmetric keys can either be imported or generated on-board (in which case, no one can 
ever know the private key). Two important operations that smartcards can carry out are 
basic block cipher operations and CBC-MAC computation. These operations may be viewed 
as pseudorandom function computations, and we will use them as such. The symmetric 
algorithms typically supported by smartcards use 3DES and/or AES, and the asymmetric 
algorithms use RSA (with some also supporting elliptic curve operations). 

2. Authenticated operations: It is possible to "protect" a cryptographic operation by a logical 
test. In order to pass such a test, the user must either present a password or pass a 
challenge/response test (in the latter case, the smartcard outputs a random challenge and the 
user must reply with a response based on some cryptographic operation using a password or 
key applied to the random challenge). 

3. Access conditions: It is possible to define what operations on a key are allowed and what 
are not allowed. There is great granularity here. For all operations (e.g., use key, delete key, 
change key and so on), it is possible to define that no one is ever allowed, anyone is allowed, 
or only a party passing some test is allowed. We stress that for different operations (like use 
and delete) a different test (e.g., a different password) can also be defined. 

4. Special access conditions: There are a number of special operations; we mention two here. 
The first is a usage counter; such a counter is defined when a key is either generated or 
imported and it says how many times the key can be used before it "expires". Once the key 
has expired it can only be deleted. The second is an access-granted counter and is the same 
as a usage counter except that it defines how many times a key can be used after passing a 
test, before the test must be passed again. For example, setting the access-granted counter 
to 1 means that the test (e.g., passing a challenge/response) must be passed every time the 
key is used. 

5. Secure messaging: Operations can be protected by "secure messaging" which means that all 
data is encrypted and/or authenticated by a private (symmetric) key that was previously 
imported to the smartcard. An important property of secure messaging is that it is possible 
to receive a "receipt" testifying to the fact that the operation was carried out; when secure 
messaging with message authentication is used, this receipt cannot be tampered with by a 
man-in-the-middle adversary. Thus, it is possible for one party to initialize a smartcard and 
send it to another party, with the property that the first party can still carry out secure 
operations with the smartcard without the second party being able to learn anything or 
tamper with the communication in an undetected way. One example where this may be 
useful is that the first party can import a secret key to the smartcard without the second 
party who physically holds the card learning the key. We remark that it is typically possible 
to define a different key for secure messaging that is applied to messages being sent to the 
smartcard and to messages that are received from the smartcard (and thus it is possible to 
have unidirectional secure messaging only). 

6. Store files: A smartcard can also be used to store files. Such files can either be public (meaning 
anyone can read them) or private (meaning that some test must be passed in order to read 
the file). We stress that private keys are not files because such a key can never be read out 
of a smartcard. In contrast a public key is essentially a file. 

We stress that all reasonable smartcards have all of the above properties, with the possible exception 
of the special access conditions mentioned above in item 4. We do not have personal knowledge of 
any smartcard that lacks them, but we are not familiar with all smartcard vendors. We do know that 
the smartcards of Siemens (and others) have these two counters. 
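
To make the two counters of item 4 concrete, the following is a toy software model of a key protected by a password test, a usage counter and an access-granted counter. It is only a sketch: the class `CounterProtectedKey` and its methods are hypothetical names, real cards expose this behavior through vendor-specific commands, and HMAC-SHA256 stands in for the card's keyed operation (the text uses 3DES or AES).

```python
import hashlib
import hmac


class CounterProtectedKey:
    """Toy model of a smartcard key with a usage counter and an
    access-granted counter. HYPOTHETICAL class, for illustration only."""

    def __init__(self, key: bytes, password: bytes,
                 usage_limit: int, access_granted: int):
        self._key = key
        self._password = password
        self._uses_left = usage_limit          # usage counter
        self._access_granted = access_granted  # uses allowed per passed test
        self._granted_left = 0                 # uses left since the last test

    def present_password(self, attempt: bytes) -> bool:
        """Pass the logical test; on success, refill the access-granted budget."""
        ok = hmac.compare_digest(attempt, self._password)
        if ok:
            self._granted_left = self._access_granted
        return ok

    def use(self, data: bytes) -> bytes:
        """Apply the keyed operation if both counters allow it."""
        if self._uses_left <= 0:
            raise PermissionError("key expired: usage counter exhausted")
        if self._granted_left <= 0:
            raise PermissionError("test must be passed again")
        self._uses_left -= 1
        self._granted_left -= 1
        # ASSUMPTION: HMAC-SHA256 as a stand-in for the on-card block cipher.
        return hmac.new(self._key, data, hashlib.sha256).digest()
```

With `access_granted=1`, every call to `use` must be preceded by a successful `present_password`, matching the example given in item 4; once `usage_limit` calls have succeeded, the key can no longer be used at all.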

3 Secure Set Intersection 

In this section we show how to securely compute the secure set intersection problem defined by 
$F_\cap(X, Y) = X \cap Y$, where $X = \{x_1, \ldots, x_{n_1}\}$ and $Y = \{y_1, \ldots, y_{n_2}\}$, and one party receives output 
(while the other learns nothing). We note that the problem of securely computing the function $f_{eq}$, 
defined as $f_{eq}(x, y) = 1$ if and only if $x = y$, is a special case of set intersection. Thus, our protocol 
can also be used to compute $f_{eq}$ with extremely high efficiency. 

The basic idea behind our protocol is as follows. The first party $P_1$, with input set $X = 
\{x_1, \ldots, x_{n_1}\}$, initializes a smartcard with a secret key $k$ for a pseudorandom permutation $F$ (i.e., $F$ 
is a block cipher like 3DES or AES). Then, it computes $X_F = \{F_k(x_1), \ldots, F_k(x_{n_1})\}$ and sends $X_F$ 
and the smartcard to the second party. The second party $P_2$, with input $Y = \{y_1, \ldots, y_{n_2}\}$, then 
uses the smartcard to compute $F_k(y_i)$ for every $i$, and it outputs every $y_i$ for which $F_k(y_i) \in X_F$. It 
is clear that $P_1$ learns nothing because it does not receive anything in the protocol. Regarding $P_2$, 
if it uses the smartcard to compute $F_k(y)$ for some $y \in X \cap Y$, then it learns that $y \in X$, but this 
is the information that is supposed to be revealed! In contrast, for every $x \in X$ for which $P_2$ 
does not use the smartcard to compute $F_k(x)$, it learns nothing about $x$ from $X_F$ (because $F_k(x)$ 
just looks like a random value). 
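
The basic idea can be sketched in a few lines. This is an illustration under stated assumptions, not the implementation evaluated in the paper: HMAC-SHA256 is used as a stand-in pseudorandom function because it is available in the Python standard library (the text uses a block cipher such as 3DES or AES), the smartcard is modeled as a plain callable, and the helper names `prf`, `p1_encode` and `p2_intersect` are hypothetical.

```python
import hashlib
import hmac
import secrets


def prf(key: bytes, element: bytes) -> bytes:
    # ASSUMPTION: HMAC-SHA256 as a stand-in pseudorandom function; the
    # text uses a block cipher (3DES or AES) computed on the smartcard.
    return hmac.new(key, element, hashlib.sha256).digest()


def p1_encode(key: bytes, xs: set) -> set:
    """First party: apply the keyed function to every element of X."""
    return {prf(key, x) for x in xs}


def p2_intersect(card_eval, ys: set, x_f: set) -> set:
    """Second party: query the card on every y and keep the matches.

    card_eval models the smartcard; the second party can call it but
    never sees the key inside it.
    """
    return {y for y in ys if card_eval(y) in x_f}


k = secrets.token_bytes(16)              # chosen by the first party
x_f = p1_encode(k, {b"alice", b"bob", b"carol"})
card = lambda y: prf(k, y)               # stands in for the smartcard
result = p2_intersect(card, {b"bob", b"dave"}, x_f)
print(result)
```

This sketch deliberately omits any limit on the number of card queries and any ordering constraint between the two parties.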

Despite the above intuitive security argument, there are a number of subtleties that arise. First, 
nothing can stop $P_2$ from asking the smartcard to compute $F_k(y)$ for a huge number of $y$'s (taking 
this to an extreme, if $X$ and $Y$ are social security numbers, then $P_2$ can use the smartcard to 
compute the permutation on all possible social security numbers). We prevent this by having $P_1$ 
initialize the key $k$ on the smartcard with a usage counter set to $n_2$. Recall that this means that 
the key $k$ can be used at most $n_2$ times, after which the key can only be deleted. In addition to 
the above, in order to fully prove the security of our protocol, we need to have party $P_2$ compute 
$F_k(y)$ for all $y \in Y$ before $P_1$ sends it $X_F$ (this is a technicality that comes out of the proof). In 
order to achieve this, we have $P_1$ initialize $k$ with secure messaging for authentication using an 
additional key $k_{init}$. This initialization is an association between the key $k$ and the key $k_{init}$ so that 
when a command to delete $k$ is issued to the smartcard, the confirmation by the smartcard that 
this operation took place is authenticated using a message authentication code keyed with $k_{init}$ 
(standard smartcards support such a configuration). Observe that given this initialization, $P_2$ can 
prove to $P_1$ that it has deleted $k$ before $P_1$ sends $X_F$ (note that $P_1$ knows $k_{init}$ and so can verify 
that the MAC is correct). 



3.1 The Basic Protocol 

Let $F$ be a pseudorandom permutation with domain $\{0,1\}^n$ and keys that are chosen uniformly 
from $\{0,1\}^n$ (this is for simplicity only). 

Protocol 1 (secure set intersection - $P_2$ only receives output) 

• Inputs: Party $P_1$ has a set of $n_1$ elements and party $P_2$ has a set of $n_2$ elements; all elements 
are taken from $\{0,1\}^n$, where $n$ also serves as the security parameter. 

• Auxiliary inputs: Both $P_1$ and $P_2$ are given $n_1$ and $n_2$, as well as the security parameter $n$. 

• SmartCard Initialization: Party $P_1$ chooses two keys $k, k_{init} \leftarrow \{0,1\}^n$ and imports $k$ into a 
smartcard SC for usage as the key to a block cipher (pseudorandom permutation). $P_1$ sets the 
usage counter of $k$ to be $n_2$ and defines that the confirmation to DeleteObject is MACed using 
the key $k_{init}$. $P_1$ sends SC to $P_2$ (this takes place before the protocol below begins). 4 

• The protocol: 

1. P2's first step: 

(a) Given the smartcard SC, party P2 computes the set Y_F = {(y, F_k(y))}_{y ∈ Y}. 

(b) Next, P2 issues a DeleteObject command to the smartcard to delete k and receives 
back the confirmation from the smartcard. 

(c) P2 sends the delete confirmation to P1. 

2. P1's step: P1 checks that the DeleteObject confirmation states that the operation was 
successful and verifies the MAC-tag on the response. If either of these checks fails, then P1 
outputs ⊥ and halts. Else, it computes the set X_F = {F_k(x)}_{x ∈ X}, sends it to P2 and halts. 

3. P2's second step: P2 outputs the set {y | F_k(y) ∈ X_F} and halts. 
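
The message flow of Protocol 1 can be illustrated with a minimal Python sketch. It is not the implementation reported in this paper: the smartcard is simulated in software by a hypothetical class `SimulatedCard`, the pseudorandom permutation F is replaced by HMAC-SHA256 truncated to one block (a pseudorandom function, not a permutation), and the format of the delete confirmation is an assumption made for illustration.

```python
import hashlib
import hmac
import secrets

N_BYTES = 16  # block length n in bytes (assumed for this sketch)


def prf(key, block):
    """Stand-in for F_k: HMAC-SHA256 truncated to one block (assumption)."""
    return hmac.new(key, block, hashlib.sha256).digest()[:N_BYTES]


class SimulatedCard:
    """Toy model of SC: key k with a usage counter and a MACed delete confirmation."""

    def __init__(self, k, k_init, usage_limit):
        self._k = k
        self._k_init = k_init
        self._uses_left = usage_limit

    def evaluate(self, block):
        if self._k is None or self._uses_left <= 0:
            raise PermissionError("key deleted or usage counter exhausted")
        self._uses_left -= 1
        return prf(self._k, block)

    def delete_object(self):
        # Delete k and return a confirmation authenticated under k_init.
        self._k = None
        status = b"DeleteObject:OK"  # hypothetical confirmation format
        return status, hmac.new(self._k_init, status, hashlib.sha256).digest()


def run_protocol(X, Y):
    # Initialization by P1: choose k and k_init, set the usage counter to n2 = |Y|.
    k, k_init = secrets.token_bytes(N_BYTES), secrets.token_bytes(N_BYTES)
    card = SimulatedCard(k, k_init, usage_limit=len(Y))

    # Step 1 (P2): compute Y_F, delete k, forward the confirmation to P1.
    Y_F = {y: card.evaluate(y) for y in Y}
    status, tag = card.delete_object()

    # Step 2 (P1): verify the confirmation and its MAC, then send X_F.
    expected = hmac.new(k_init, status, hashlib.sha256).digest()
    if status != b"DeleteObject:OK" or not hmac.compare_digest(tag, expected):
        raise ValueError("abort")  # corresponds to P1 outputting ⊥
    X_F = {prf(k, x) for x in X}

    # Step 3 (P2): output every y whose image under F_k appears in X_F.
    return {y for y, fy in Y_F.items() if fy in X_F}
```

In this sketch the usage counter is what prevents P2 from evaluating F_k on more than n2 values, and the MACed confirmation is what allows P1 to withhold X_F until k is gone.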

The following theorem is formally proven in [17]. 

Theorem 2 Assume that F is a pseudorandom permutation over {0,1}^n. Then, Protocol 1 securely 
computes the function F_∩(X, Y) = X ∩ Y in the presence of malicious adversaries, where only P2 
receives output. 

Reusing the smartcard. Although we argue that it is realistic for parties in non-transient 
relationships to send smartcards to each other, it is not very practical for them to do this every time 
they wish to run the protocol. Rather, they should be able to do this only once, and then run 
the protocol many times. This is achieved in a very straightforward way using secure messaging. 
Specifically, P1 initializes the smartcard so that a key for a block cipher (pseudorandom 
permutation) can be imported, while encrypted under a secure messaging key k_sm. This means that P1 
can begin the protocol by importing a new key k to the smartcard (with usage counter n2 for the 
size of the set in this execution and protected with k_init for delete as above). Thus, P1 
only needs to send a smartcard once to P2 and the protocol can be run many times, using standard 
network communication only. 

4 We assume that SC is sent via a secure carrier and s 
both honest. This assumption can be removed by protect: 
the password to P 2 after it receives SC. 



3.2 Experimental Results 

We implemented our protocol for set intersection using the eToken smartcard of Aladdin Knowledge 
Systems and received the following results: 
These results confirm the expected complexity of approximately 50 milliseconds per smartcard 
operation. We remark that no code optimizations were made and the running-time can be further 
improved (although the majority of the work is with the smartcard and this cannot be made faster 
without further improvements in smartcard technology). 



4 Oblivious Database Search 

In this section we study the problem of oblivious database search. The aim here is to allow a 
client to search a database without the server learning the query (or queries) made by the client. 
Furthermore, the client should only be able to make a single query (or, to be more exact, the client 
should only be able to make a search query after receiving explicit permission from the server). 
This latter requirement means that the client cannot just download the entire database and run 
local searches. We present a solution whereby the client downloads the database in encrypted form, 
and then a smartcard is used to carry out a search on the database by enabling the client to decrypt 
a single database record. 

We now provide an informal and not fully accurate description of our solution. Denote the ith database record by 
(p_i, x_i), where p_i is the value of the search attribute (as is standard, the values p_1, ..., p_N are 
unique). We assume that each p_i ∈ {0,1}^n, and for some ℓ each x_i ∈ {0,1}^{ℓn} (recall that the 
pseudorandom permutation works over the domain {0,1}^n; thus p_i is made up of a single "block" 
and x_i is made up of ℓ blocks). Then, the server chooses a key k and computes t_i = F_k(p_i), 
u_i = F_k(t_i) and c_i = E_{u_i}(x_i), for every i = 1, ..., N. The server sends the encrypted database 
(t_i, c_i) to the client, together with a smartcard SC that has the key k. The key k is also protected 
by a challenge/response with a key k_test that only the server knows; in addition, after passing a 
challenge/response, the key k can be used only twice (this is achieved by setting the access-granted 
counter of k to 2; see Section 2). Now, since F is a pseudorandom function, the value t_i reveals 
nothing about p_i, and the "key" u_i is pseudorandom, implying that c_i is a cryptographically sound 
(i.e., secure) encryption of x_i that therefore reveals nothing about x_i. In order to search the 
database for attribute p, the client obtains a challenge from the smartcard for k_test and sends it to 
the server. If the server agrees that the client can carry out a search, it computes the response and 
sends it back. The client then computes t = F_k(p) and u = F_k(t) using the smartcard. If there 
exists an i for which t = t_i, then the client decrypts c_i using the key u, obtaining the record x_i as 
required. Note that the server has no way of knowing the search query of the client. Furthermore, 
the client cannot carry out the search without explicit approval from the server, and thus the 
number of searches can be audited and limited (if required for privacy purposes), or a charge can 
be issued (if a pay-per-search system is in place). 



We warn that the above description is not a fully secure solution. To start with, it is possible for 
a client to use the key k to compute t and t' for two different values p and p'. Although this means 
that the client will not be able to obtain the corresponding records x and/or x', it does mean that 
it can see whether the two values p and p' are in the database (something which it is not supposed 
to be able to do, because just the existence of an identifier in a database can reveal confidential 
information). We therefore use two different keys k_1 and k_2; k_1 is used to compute t and k_2 is used 
to compute u. In addition, we do not use u to directly encrypt x; instead, we use the smartcard with a third 
key k_3 (this is needed to enable a formal reduction to the security of the encryption scheme). 
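
The membership leak described above can be made concrete with a short Python sketch. The class `SingleKeyCard` and the identifiers `id-17`, `id-42` and `id-99` are hypothetical, and HMAC-SHA256 truncated to one block stands in for F_k; the sketch only models the single-key variant with an access-granted counter of 2.

```python
import hashlib
import hmac

N_BYTES = 16  # block length n in bytes (assumed for this sketch)


def prf(key, block):
    """Stand-in for F_k: HMAC-SHA256 truncated to one block (assumption)."""
    return hmac.new(key, block, hashlib.sha256).digest()[:N_BYTES]


class SingleKeyCard:
    """Toy card for the single-key variant: key k may be used twice per approval."""

    def __init__(self, k):
        self._k = k
        self._left = 2  # access-granted counter of k

    def evaluate(self, block):
        if self._left <= 0:
            raise PermissionError("access-granted counter exhausted")
        self._left -= 1
        return prf(self._k, block)


k = b"\x01" * N_BYTES
# The tags t_i that the client holds as part of the encrypted database.
database_tags = {prf(k, p) for p in (b"id-17", b"id-42")}

card = SingleKeyCard(k)
# Intended use: t = F_k(p), then u = F_k(t).
# Misuse: spend both permitted calls on first-level tags for two identifiers.
t1 = card.evaluate(b"id-42")
t2 = card.evaluate(b"id-99")
membership = (t1 in database_tags, t2 in database_tags)
```

Here a single approved search tells the client whether each of two identifiers is present, although it yields no decryption key. With separate keys k_1 and k_2, each limited to one use per approval, only one tag can be computed per search.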

4.1 The Functionality 

We begin by describing the functionality for the problem of oblivious database search. In the 
case of set-intersection above, it sufficed for us to define a simple function mapping the parties' 
inputs to specified outputs. However, here we consider an inherently interactive setting where the 
client carries out multiple searches, one after another. In order to formally define the security of 
a protocol for this problem, we need to define the input/output behavior of the problem being 
solved. This is achieved by describing the algorithmic input/output behavior of the functionality 
as if it is computed by an external entity. We stress that in reality there is no external entity and 
this is used only for the sake of defining the desired input/output behavior. We remark that a real 
protocol that is run between the parties over the network is secure if its output is essentially the 
same as that of the functionality (as described), even if some of the parties maliciously attack the 
protocol. The functionality that we define is interactive: the server P1 first sends the database and 
the client can then carry out searches. We stress that the client can choose its queries adaptively, 
meaning that it can choose what keywords to search for after it has already received the output 
from previous queries. However, each query must be explicitly allowed by the server (this allows 
the server to limit queries or to charge per query). We first present a basic functionality and then 
a more sophisticated one: 



The Oblivious Database Search Functionality F_basicDB 

F_basicDB works with a server P1 and a client P2 as follows (the variable init is initially set to 0): 

Initialize: Upon receiving from P1 a message (init, (p_1, x_1), ..., (p_N, x_N)), if init = 0, functionality 
F_basicDB sets init = 1, stores all pairs and sends (init, N) to P2. If init = 1, then F_basicDB ignores 
the message. 

Search: Upon receiving a message retrieve from P2, functionality F_basicDB checks that init = 1 and if 
not it returns notInit. Otherwise, it sends retrieve to P1. If P1 replies with allow then F_basicDB 
forwards allow to P2. When P2 replies with (retrieve, p), F_basicDB works as follows: 

1. If there exists an i for which p = p_i, functionality F_basicDB sends (retrieve, x_i) to P2. 

2. If there is no such i, then F_basicDB sends notFound to P2. 

If P1 replies with disallow, then F_basicDB forwards disallow to P2. 

Figure 1: The basic oblivious database search functionality 
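
The input/output behavior of Figure 1 can be written down as a small Python class. This is only a sketch of the ideal functionality, not of any protocol: the class and method names are hypothetical, and the two-round exchange (retrieve, then allow, then the query p) is collapsed into a single call in which the Boolean `server_allows` models the reply of P1.

```python
class BasicDBFunctionality:
    """Sketch of the ideal functionality F_basicDB (hypothetical class name)."""

    def __init__(self):
        self.init = 0
        self.records = {}

    def initialize(self, pairs):
        """Message (init, (p_1, x_1), ..., (p_N, x_N)) from the server P1."""
        if self.init == 1:
            return None  # the message is ignored
        self.init = 1
        self.records = dict(pairs)
        return ("init", len(self.records))  # sent to the client P2

    def search(self, p, server_allows):
        """Search request from P2; P1 only decides allow/disallow and never sees p."""
        if self.init != 1:
            return "notInit"
        if not server_allows:
            return "disallow"
        if p in self.records:
            return ("retrieve", self.records[p])
        return "notFound"
```

Note that the decision of P1 is taken without access to p, which is exactly the obliviousness requirement, and that P2 obtains at most one record per approved query.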

The main drawback with F_basicDB is that the database is completely static and updates cannot 
be made by the server. We therefore modify F_basicDB so that inserts and updates are included. 
An insert operation adds a new record to the database, while an update operation makes a change 
to the x portion of an existing record. We stress that in an update, the previous x value is not 
erased, but rather the new value is concatenated to the old one. We define the functionality in this 
way because it affords greater efficiency. Recall that in our protocol, the client holds the entire 
database in encrypted form. Furthermore, the old and new x portions are encrypted with the same 
key. Thus, if the client does not erase the old encrypted x value, it can decrypt it at the same 
time that it is able to decrypt the new x value. Another subtlety that arises is that since inserts 
are carried out over time, and the client receives encrypted records when they are inserted, it is 
possible for the client to know when a decrypted record was inserted. In order to model this, we 
attach unique identifiers to records; when a record is inserted, the ideal functionality hands the 
client the identifier of the inserted record. Then, when a search succeeds, the client receives the 
identifier together with the x portion. This allows the client in the ideal model to track when a 
record was inserted (of course, without revealing anything about its content). Finally, we remark 
that our solution does not efficiently support delete commands (this is for the same reason that 
updates are modeled as concatenations). We therefore include a reset command that deletes all 
records. This requires the server to re-encrypt the entire database from scratch and send it to the 
client. Thus, such a command cannot be issued at too frequent intervals. See Figure 2 for the full 
definition of F_db. 

The Oblivious Database Functionality F_db 

Functionality F_db works with a server P1 and client P2 as follows (the variable init is initially set to 0): 

Insert: Upon receiving a message (insert, p, x) from P1, functionality F_db checks that there is no 
recorded tuple (id_i, p_i, x_i) for which p = p_i. If there is such a tuple it ignores the message. 
Otherwise, it assigns an identifier id to (p, x), sends (insert, id) to P2, and records the tuple 
(id, p, x). 

Update: Upon receiving a message (update, p, x) from P1, functionality F_db checks that there is a 
recorded tuple (id_i, p_i, x_i) for which p = p_i. If there is no such tuple it ignores the message. 
Otherwise it updates the tuple, by concatenating x to x_i. 

Retrieve: Upon receiving a query (retrieve, p) from the client P2, functionality F_db sends retrieve to 
P1. If P1 replies with allow then: 

1. If there exists a recorded tuple (id_i, p_i, x_i) for which p = p_i, then F_db sends (id_i, x_i) to P2. 

2. If there does not exist such a tuple, then F_db sends notFound to P2. 

Reset: Upon receiving a message reset from P1, the functionality F_db sends reset to P2 and erases all 
entries. 

Figure 2: A more comprehensive database functionality 
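
The following Python class is a sketch of the behavior specified in Figure 2, under the assumption that identifiers are consecutive integers (the figure only requires that they be unique). The class name is hypothetical, and since the figure does not state what is sent when P1 does not reply with allow, the sketch sends nothing in that case.

```python
import itertools


class DBFunctionality:
    """Sketch of the ideal functionality F_db (hypothetical class name)."""

    def __init__(self):
        self._ids = itertools.count(1)  # assumed identifier scheme: 1, 2, 3, ...
        self.records = {}               # maps p to the pair (id, x)

    def insert(self, p, x):
        if p in self.records:
            return None                 # duplicate search attribute: ignored
        rec_id = next(self._ids)
        self.records[p] = (rec_id, x)
        return ("insert", rec_id)       # the identifier is handed to P2

    def update(self, p, x):
        if p not in self.records:
            return None                 # no such record: ignored
        rec_id, old_x = self.records[p]
        self.records[p] = (rec_id, old_x + x)  # concatenate, never overwrite

    def retrieve(self, p, server_allows):
        if not server_allows:
            return None                 # not specified in Figure 2; nothing is sent
        if p in self.records:
            return self.records[p]      # the pair (id, x) goes to P2
        return "notFound"

    def reset(self):
        self.records.clear()
        return "reset"                  # forwarded to P2
```

The update method makes the modeling choice discussed above explicit: a client that retrieves a record after an update receives the old and the new x portions together.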

4.2 A Protocol for Securely Computing F_basicDB 

We first present a protocol for securely computing the basic functionality F_basicDB. Let F be an 
efficiently invertible pseudorandom permutation over {0,1}^n with keys that are uniformly chosen 
from {0,1}^n (in practice, F is a block cipher like 3DES or AES). We define a keyed function F̂ 
from {0,1}^n to {0,1}^{ℓn} by 

F̂_k(t) = (F_k(t + 1), F_k(t + 2), ..., F_k(t + ℓ)) 

where addition is modulo 2^n. We remark that F̂_k is a pseudorandom function when the input t is 
uniformly distributed (this actually follows directly from the proof of security in counter mode for 
block ciphers). We assume that all records in the database are exactly of length ℓn (and that this 
is known); if this is not the case, then padding can be used. 
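
As an illustration of the definition of F̂, the following Python sketch expands one block t into ℓ blocks and uses the result as a one-time pad for a record of length ℓn. The block cipher is replaced by HMAC-SHA256 truncated to 16 bytes, which is an assumption of the sketch, as are the function names.

```python
import hashlib
import hmac

N_BYTES = 16                  # block length n in bytes (assumed for this sketch)
MODULUS = 1 << (8 * N_BYTES)  # 2^n


def prf(key, block):
    """Stand-in for F_k: HMAC-SHA256 truncated to one block (assumption)."""
    return hmac.new(key, block, hashlib.sha256).digest()[:N_BYTES]


def f_hat(key, t, ell):
    """F̂_k(t) = (F_k(t+1), F_k(t+2), ..., F_k(t+ℓ)) with addition modulo 2^n."""
    t_int = int.from_bytes(t, "big")
    blocks = [
        prf(key, ((t_int + j) % MODULUS).to_bytes(N_BYTES, "big"))
        for j in range(1, ell + 1)
    ]
    return b"".join(blocks)


def xor_bytes(a, b):
    """Bytewise XOR of two equal-length strings."""
    return bytes(x ^ y for x, y in zip(a, b))
```

A record x of ℓ blocks would then be encrypted as `xor_bytes(f_hat(key, t, ell), x)` and decrypted by applying the same operation to the ciphertext.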

In our protocol, we use a challenge/response mechanism in the smartcard to restrict use of 
cryptographic keys. We assume that the response to a challenge chall 
with key k_test is F_{k_test}(chall), where F is a pseudorandom permutation as above. This choice makes no 
difference; we fix it only so that the protocol description is concrete. 

Protocol 3 (oblivious database search - basic functionality F_basicDB) 

• Smartcard initialization: Party P1 chooses three keys k_1, k_2, k_3 ← {0,1}^n and imports them 
into a smartcard SC for use for a pseudorandom permutation. In addition, P1 imports a key 
k_test as a test object that protects them all by challenge/response. Finally, P1 sets the 
access-granted counter of k_1 and k_2 to 1, denoted respectively by AG_1, AG_2 (and sets no access-granted 
counter of k_3). See Section 2 for the definition of an access-granted counter. 

P1 sends SC to P2 (this takes place before the protocol below begins). Upon receiving SC, 
party P2 checks that there exist three keys with the properties defined above; if not it outputs ⊥ 
and halts.^5 

• The protocol: 

• Initialize: Upon input (init, (p_1, x_1), ..., (p_N, x_N)) for party P1, the parties work as follows: 

1. P1 randomly permutes the pairs (p_i, x_i). 

2. For every i, P1 computes t_i = F_{k_1}(p_i), u_i = F_{k_2}(t_i) and c_i = F̂_{k_3}(t_i) ⊕ x_i. 

3. P1 sends (u_1, c_1), ..., (u_N, c_N) to P2 (these pairs are an encrypted version of the database). 

4. Upon receiving (u_1, c_1), ..., (u_N, c_N), party P2 stores the pairs and outputs (init, N). 

• Search: Upon input (retrieve, p) for party P2, the parties work as follows: 

1. P2 queries SC for a challenge, receiving chall. P2 sends chall to P1. 

2. Upon receiving chall, if party P1 allows the search it computes resp = F_{k_test}(chall) and 
sends resp to P2. Otherwise, it sends disallow to P2. 

3. Upon receiving resp, party P2 hands it to SC in order to pass the test. Then: 

(a) P2 uses SC to compute t = F_{k_1}(p) and u = F_{k_2}(t). 

(b) If there does not exist any i for which u = u_i, then P2 outputs notFound. 

(c) If there exists an i for which u = u_i, party P2 uses SC to compute r = F̂_{k_3}(t); this 
involves ℓ calls to F_{k_3} in SC. Then, P2 sets x = r ⊕ c_i and outputs (retrieve, x). 
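
The steps of Protocol 3 can be traced in the following Python sketch. It makes several assumptions that are not part of the protocol: the smartcard is a software object (the hypothetical class `SimulatedCard`), F is replaced by HMAC-SHA256 truncated to one block, passing the test resets AG_1 and AG_2 to 1, and k_3 becomes usable once a test has been passed. The callback `server_respond` models P1 and returns either resp or None for disallow.

```python
import hashlib
import hmac
import secrets

N_BYTES = 16                  # block length n in bytes (assumed)
MODULUS = 1 << (8 * N_BYTES)  # 2^n


def prf(key, block):
    """Stand-in for F_k: HMAC-SHA256 truncated to one block (assumption)."""
    return hmac.new(key, block, hashlib.sha256).digest()[:N_BYTES]


def f_hat(evaluate, t, ell):
    """F̂(t) = (F(t+1), ..., F(t+ℓ)); `evaluate` computes one block of F."""
    t_int = int.from_bytes(t, "big")
    return b"".join(
        evaluate(((t_int + j) % MODULUS).to_bytes(N_BYTES, "big"))
        for j in range(1, ell + 1)
    )


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


class SimulatedCard:
    """Toy model of SC holding k_1, k_2, k_3 behind a k_test challenge/response."""

    def __init__(self, k1, k2, k3, k_test):
        self._keys = {1: k1, 2: k2, 3: k3}
        self._k_test = k_test
        self._granted = {1: 0, 2: 0}  # access-granted counters AG_1, AG_2
        self._unlocked = False        # k_3 has no counter (assumed: needs one passed test)
        self._chall = None

    def get_challenge(self):
        self._chall = secrets.token_bytes(N_BYTES)
        return self._chall

    def pass_test(self, resp):
        ok = self._chall is not None and hmac.compare_digest(
            resp, prf(self._k_test, self._chall))
        self._chall = None
        if ok:
            self._granted = {1: 1, 2: 1}
            self._unlocked = True
        return ok

    def evaluate(self, key_id, block):
        if key_id in self._granted:
            if self._granted[key_id] <= 0:
                raise PermissionError("access not granted")
            self._granted[key_id] -= 1
        elif not self._unlocked:
            raise PermissionError("test not passed")
        return prf(self._keys[key_id], block)


def server_initialize(k1, k2, k3, records, ell):
    """Initialize (P1): returns the randomly ordered pairs (u_i, c_i)."""
    pairs = []
    for p, x in records:
        t = prf(k1, p)
        pad = f_hat(lambda b: prf(k3, b), t, ell)
        pairs.append((prf(k2, t), xor_bytes(pad, x)))
    secrets.SystemRandom().shuffle(pairs)
    return pairs


def client_search(card, pairs, p, ell, server_respond):
    """Search (P2): `server_respond` maps chall to resp, or to None for disallow."""
    resp = server_respond(card.get_challenge())
    if resp is None or not card.pass_test(resp):
        return "disallow"
    t = card.evaluate(1, p)
    u = card.evaluate(2, t)
    table = dict(pairs)
    if u not in table:
        return "notFound"
    r = f_hat(lambda b: card.evaluate(3, b), t, ell)
    return ("retrieve", xor_bytes(r, table[u]))
```

In the sketch, P1 sees only the random challenge, and a second evaluation under k_1 without a fresh approval is refused by the card.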

Clearly, P1 learns nothing about P2's queries because the only values that P1 sees are random 
challenges issued by the smartcard. Furthermore, P2 can only query the database when P1 explicitly 
allows it (because P2 is unable to pass the test without P1's help). Finally, P2 can only learn a 
single value per query because each row of the database is encrypted using a different t value and 
without a correct first query, it is not possible to find a valid t. A full proof of security is provided 
in [17], demonstrating that the above intuition is sound. We thus have the following theorem: 

Theorem 4 Assume that F is a strong pseudorandom permutation over {0,1}^n and let F̂ be as 
defined above. Then, Protocol 3 securely computes F_basicDB. 

is of keys. If not, this will be discovered the first time a search is carried 



4.3 A Protocol for Securely Computing F_db 

A protocol for securely computing the more sophisticated functionality F_db can be derived directly 
from Protocol 3. Specifically, instead of sending all the pairs (u_i, c_i) at the onset, P1 sends a new 
pair every time an insert is carried out. In addition, an update just involves P1 encrypting the 
new x_i value and sending the new ciphertext c'_i. Finally, a reset is carried out by choosing new keys 
k_1, k_2, k_3 and writing them to the smartcard (deleting the previous ones). Then, any future inserts 
are computed using these new keys. Of course, the new keys are written to the smartcard using 
secure messaging, as we have described above. 

5 Oblivious Document Search 

In Section 4 we showed how a database can be searched obliviously, where the search is based 
only on a key attribute. Here, we show how to extend this to a less structured database, and in 
particular to a corpus of texts. In this case, there are many keywords that are associated with each 
document and the user wishes to gain access to all of the documents that contain a specific keyword. 
A naive solution would be to define each record value so that it contains all the documents which 
the keyword appears in. However, this would be prohibitively expensive because the same document 
would have to be repeated many times. We present a solution where each document is stored 
(encrypted) only once, as follows. 

Our solution uses Protocol 3 as a subprotocol. The basic idea is for the parties to use F_basicDB 
to store an index to the corpus of texts as follows. The server chooses a random value s_i for every 
document D_i and then associates with a keyword p the values s_i where p appears in the document 
D_i. Then, this index is sent to F_basicDB, enabling P2 to search it obliviously. In addition, P1 
encrypts document D_i using a smartcard and s_i in the same way that the x_i values are encrypted 
using t_i in Protocol 3. Since P2 is only able to decrypt a document if it has the appropriate s_i 
value, it can only do this if it queried F_basicDB with a keyword p that is in document D_i. Observe 
that in this way, each document is only encrypted once. 

Let V be the space of keywords of size M, let D_1, ..., D_N denote N text documents, and let 
P_i = {p_ij} be the set of keywords that appear in D_i (note P_i ⊆ V). Using this notation, when a 
search is carried out for a keyword p, the client is supposed to receive the set of documents D_i for 
which p ∈ P_i. We now proceed to formally define the oblivious document search functionality F_doc: 



The Oblivious Document Search Functionality F_doc 

Functionality F_doc works with a server P1 and client P2 as follows (the variable init is initially set to 0): 

Initialize: Upon receiving from P1 a message (init, V, D_1, ..., D_N), if init = 0, functionality F_doc sets 
init = 1, stores all documents and V, and sends (init, N, M) to P2, where N is the number of 
documents and M is the size of the keyword set V. If init = 1, then F_doc ignores the message. 

Search: Upon receiving a message search from P2, functionality F_doc checks that init = 1 and if not 
it returns notInit. Otherwise, it sends search to P1. If P1 replies with allow then F_doc forwards 
allow to P2. When P2 replies with (search, p), F_doc works as follows: 

1. If there exists an i for which p ∈ P_i, functionality F_doc sends (search, {D_i}_{p ∈ P_i}) to P2. 

2. If there is no such i, then F_doc sends notFound to P2. 

If P1 replies with disallow, then F_doc forwards disallow to P2. 

Figure 3: Oblivious document search via keywords 



Our protocol uses the additional tool of a perfectly-hiding commitment scheme, denoted by (com, dec), 
that enables a party to commit to a value while keeping it secret (even from an all-powerful adversary); 
see [12] for a formal definition. We let com(m; r) denote the commitment to a message m using 
random coins r. For efficiency, we instantiate com(·;·) with Pedersen's commitment scheme [23]. 
Assume, for simplicity, that q - 1 = 2q' for some prime q', and let g, h be generators of a subgroup 
of Z_q^* of order q'. A commitment to m is then defined as com(m; r) = g^m h^r where r ←_R Z_{q-1}. 
The scheme is perfectly hiding since for every m, r, m' there exists r' such that g^m h^r = g^{m'} h^{r'}. The 
scheme is binding assuming hardness of computing log_g h. 
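
The commitment scheme can be illustrated numerically with a Python sketch over toy parameters. The values q = 23, q' = 11, g = 4 and h = 9 are assumptions chosen only so that the arithmetic can be checked by hand (4 and 9 are squares modulo 23 and hence generate the subgroup of order 11); they offer no security, and in a real instantiation log_g h must be unknown to the committer.

```python
import secrets

# Toy parameters, for illustration only (assumed): q - 1 = 2q' with q' prime.
Q = 23
Q_PRIME = 11
G = 4  # generator of the order-11 subgroup of Z_23^*
H = 9  # second generator of the same subgroup


def commit(m, r):
    """com(m; r) = g^m * h^r mod q."""
    return (pow(G, m, Q) * pow(H, r, Q)) % Q


def fresh_randomness():
    """Sample the coins r uniformly from Z_{q-1}."""
    return secrets.randbelow(Q - 1)


def equivalent_randomness(m, r, m_other):
    """Find r' with com(m; r) = com(m_other; r') by exhaustive search.

    Such an r' always exists because h generates the subgroup, which is the
    perfect hiding property. The search is feasible only for toy parameters.
    """
    target = commit(m, r)
    return next(rp for rp in range(Q_PRIME) if commit(m_other, rp) == target)
```

For example, `equivalent_randomness(3, 7, 5)` returns coins under which the commitment to 3 with coins 7 would open to 5, showing that the commitment value alone carries no information about m. Finding such coins efficiently for real parameters would require computing log_g h, which is what the binding property rules out.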

We now present the protocol for securely computing F_doc. Recall that our protocol uses a 
subprotocol computing F_basicDB; for clarity we present the protocol referring to F_basicDB itself. 

Protocol 5 (oblivious document search by keyword) 

• Smartcard initialization: Party P1 chooses a key k ← {0,1}^n and imports it into a smartcard 
SC for use for a pseudorandom permutation. P1 sends SC to P2 (this takes place before the 
protocol below begins). 

• The protocol: 

• Initialize: Upon input (init, V, D_1, ..., D_N) to P1, the parties work as follows: 

1. The server P1 initializes a smartcard with a key k for a pseudorandom permutation, and 
sends the smartcard to P2. 

2. P1 chooses random values s_1, ..., s_N ∈_R {0,1}^n (one random value for each document), 
and sends P2 the commitments {com_i = com(s_i; r_i)}_{i=1}^{N} where r_1, ..., r_N are random 
strings of appropriate length. 

3. Then, P1 defines a database of M records (p_j, x_j) where p_j ∈ V is a keyword, and x_j = 
{(i, (s_i, r_i))}_{p_j ∈ D_i} (i.e., x_j is the set of pairs (i, (s_i, r_i)) where i is such that p_j appears 
in document D_i). Finally, it encrypts each document D_i by computing C_i = F̂_k(s_i) ⊕ D_i 
(see Section 4.2 for the definition of F̂). 

4. P1 sends C_1, ..., C_N to P2, and sends (init, (p_1, x_1), ..., (p_M, x_M)) to F_basicDB. 

5. Upon receiving com_1, ..., com_N and C_1, ..., C_N from P1 and (init, M) from F_basicDB, 
party P2 outputs (init, N, M). 

• Search: Upon input (search, p) to P2, the parties work as follows: 

1. The client P2 sends (retrieve, p) to F_basicDB and receives back a set x = {(i, (s_i, r_i))}. 

2. For every i in the set x, party P2 first verifies that com_i = com(s_i; r_i). If the verification 
holds, it uses the smartcard to compute D_i = F̂_k(s_i) ⊕ C_i. 

3. P2 outputs (search, {D_i}) where {D_i} is the set of documents obtained above. 
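
The way the keyword index and the document encryption fit together can be sketched in Python as follows. The sketch rests on several simplifications: the commitment check of step 2 of Search is omitted for brevity, keywords are taken to be the whitespace-separated words of each document, the oblivious lookup in F_basicDB is modeled by an ordinary dictionary (which, unlike the functionality, would be fully readable by whoever holds it), and F is replaced by HMAC-SHA256 truncated to one block. All function names are hypothetical.

```python
import hashlib
import hmac
import math
import secrets

N_BYTES = 16                  # block length n in bytes (assumed)
MODULUS = 1 << (8 * N_BYTES)  # 2^n


def prf(key, block):
    """Stand-in for F_k: HMAC-SHA256 truncated to one block (assumption)."""
    return hmac.new(key, block, hashlib.sha256).digest()[:N_BYTES]


def f_hat(evaluate, t, ell):
    """F̂(t) = (F(t+1), ..., F(t+ℓ)); `evaluate` computes one block of F."""
    t_int = int.from_bytes(t, "big")
    return b"".join(
        evaluate(((t_int + j) % MODULUS).to_bytes(N_BYTES, "big"))
        for j in range(1, ell + 1)
    )


def xor_bytes(a, b):
    # zip truncates the pad to the length of the document or ciphertext.
    return bytes(x ^ y for x, y in zip(a, b))


def server_initialize(k, documents):
    """Initialize (P1): build the index x_j and the ciphertexts C_i."""
    index = {}        # keyword p_j -> list of pairs (i, s_i)
    ciphertexts = []
    for i, doc in enumerate(documents):
        s_i = secrets.token_bytes(N_BYTES)  # one random value per document
        for keyword in set(doc.split()):
            index.setdefault(keyword, []).append((i, s_i))
        ell = math.ceil(len(doc) / N_BYTES)
        ciphertexts.append(xor_bytes(f_hat(lambda b: prf(k, b), s_i, ell), doc))
    return index, ciphertexts


def client_search(card_eval, index, ciphertexts, keyword):
    """Search (P2): `index.get` models F_basicDB, `card_eval` models F_k in SC."""
    x = index.get(keyword)
    if x is None:
        return "notFound"
    docs = {}
    for i, s_i in x:
        ell = math.ceil(len(ciphertexts[i]) / N_BYTES)
        docs[i] = xor_bytes(f_hat(card_eval, s_i, ell), ciphertexts[i])
    return ("search", docs)
```

Each document is encrypted exactly once, under a pad derived from its own s_i, while the index entry of every keyword in the document carries that same s_i.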

Intuitively, the protocol is secure because the only way that P2 can decrypt document D_i is to 
learn s_i. However, by the security of F_basicDB, party P2 can only learn s_i when it searches for a 
keyword p_j for which p_j ∈ D_i. This intuition is formalized in a full proof of the following theorem, 
which can be found in [17]. 

Theorem 6 Assume that F is a pseudorandom permutation over {0,1}^n and let F̂ be as defined 
in Section 4.2. Then, Protocol 5 securely computes F_doc when Protocol 3 is used in place of the 
trusted party computing F_basicDB. 



6 Conclusions and Future Directions 

We have shown that standard smartcards and standard smartcard infrastructure can be used to 
construct secure protocols that are orders of magnitude more efficient than all previously known 
solutions. In addition to being efficient enough to be used in practice, our protocols have full 
proofs of security under the most stringent definitions of security. No cryptographic protocol for a 
realistic model has achieved close to the level of efficiency of our protocols. Finally, we note that 
since standard smartcards are used, it is not difficult to deploy our solutions in practice (especially 
given the fact that smartcards are becoming more and more ubiquitous today). 

We believe that this model should be studied further with the aim of bridging the theory and 
practice of secure protocols. In addition to studying what can be achieved in the preferred setting 
where only standard smartcards are used, it is also of interest to construct highly efficient protocols 
that use special-purpose smartcards that can be implemented in Java applets on Javacards. 